Communication-optimal Parallel and Sequential Cholesky Decomposition
Authors
Abstract
Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional (O(n³)) matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the costs of various Cholesky decomposition implementations to these lower bounds and identify the algorithms and data structures that attain them. In the sequential case, we consider both the two-level and hierarchical memory models. Combined with prior results in [13, 14, 15], this gives a set of communication-optimal algorithms for O(n³) implementations of the three basic factorizations of dense linear algebra: LU with pivoting, QR, and Cholesky. But it goes beyond this prior work on sequential LU by optimizing communication for any number of levels of memory hierarchy.
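As an illustration of the kind of blocked algorithm the paper analyzes, the following is a minimal NumPy/SciPy sketch (not taken from the paper) of a right-looking blocked Cholesky factorization. The function name blocked_cholesky and the block size parameter b are illustrative choices; b would be tuned so that a few b-by-b blocks fit in a fast memory of size M. For such blocked or recursive formulations, the sequential communication lower bounds inherited from matrix multiplication take the familiar form of roughly Ω(n³/√M) words moved and Ω(n³/M^(3/2)) messages, which the communication-optimal variants discussed in the paper attain up to constant factors.

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def blocked_cholesky(A, b=64):
    # Right-looking blocked Cholesky: returns lower-triangular L with A = L @ L.T.
    # A is assumed symmetric positive definite; b is an illustrative block size.
    A = np.array(A, dtype=float, copy=True)
    n = A.shape[0]
    for k in range(0, n, b):
        kb = min(b, n - k)
        # Factor the diagonal block A11 = L11 @ L11.T.
        L11 = cholesky(A[k:k + kb, k:k + kb], lower=True)
        A[k:k + kb, k:k + kb] = L11
        if k + kb < n:
            # Panel solve: L21 = A21 @ inv(L11).T via a triangular solve.
            L21 = solve_triangular(L11, A[k + kb:, k:k + kb].T, lower=True).T
            A[k + kb:, k:k + kb] = L21
            # Symmetric trailing-matrix update; this matrix-multiply-like step
            # dominates both the arithmetic and the communication cost.
            A[k + kb:, k + kb:] -= L21 @ L21.T
    return np.tril(A)

# Illustrative check on a random SPD matrix:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 200))
A = X @ X.T + 200 * np.eye(200)
L = blocked_cholesky(A, b=32)
print(np.allclose(L @ L.T, A))  # expected: True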
Similar resources
Evaluating Quasi-Monte Carlo (QMC) algorithms in blocks decomposition of de-trended
The length of the equal minimal and maximal blocks affects the variance and bias of de-trended fluctuation analysis on a log-log scale against the sequential function; using Quasi-Monte Carlo (QMC) simulation and Cholesky decomposition, the pair of minimal and maximal blocks is found which minimizes the sum of mean squared errors in the Hurst exponent.
Efficient algorithms for estimating the general linear model
Computationally efficient serial and parallel algorithms for estimating the general linear model are proposed. The sequential block-recursive algorithm is an adaptation of a known Givens strategy that has as a main component the Generalized QR decomposition. The proposed algorithm is based on orthogonal transformations and exploits the triangular structure of the Cholesky QRD factor of the vari...
Communication Avoiding (CA) and Other Innovative Algorithms
In 1981 Hong and Kung proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n³) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it ...
Minimizing Communication in Numerical Linear Algebra
In 1981 Hong and Kung proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n³) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it...
Minimizing Communication in Linear Algebra
In 1981 Hong and Kung [HK81] proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n³) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin [ITT04] gave a new proof of this result a...
Journal: SIAM J. Scientific Computing
Volume 32, Issue -
Pages -
Publication date: 2010